Newton algorithm

On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning

Xingguo Li, Lin Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao

Neural Information Processing Systems

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on the difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.
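
As a rough illustration of the two nested loops the abstract describes, the sketch below runs a multi-stage convex relaxation in which each stage re-weights an l1 penalty (here via the MCP derivative) and is solved by proximal Newton steps on a least-squares loss. The penalty choice, coordinate-descent subproblem solver, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a DC proximal Newton loop for a
# least-squares loss with an MCP penalty, illustrating the nested structure
# of multi-stage convex relaxation + proximal Newton.
import numpy as np

def mcp_weight(x, lam, gamma=3.0):
    """Derivative of the MCP penalty, used to re-weight the l1 term per stage."""
    return np.where(np.abs(x) <= gamma * lam, lam - np.abs(x) / gamma, 0.0)

def prox_newton_weighted_lasso(X, y, w, beta, inner_iters=10, cd_passes=20):
    """Proximal Newton on 0.5/n * ||y - X beta||^2 + sum_j w_j |beta_j|.

    For least squares the Hessian H = X^T X / n is exact, so each Newton step
    reduces to a weighted-lasso subproblem, solved here by coordinate descent.
    """
    n = len(y)
    H = X.T @ X / n
    for _ in range(inner_iters):
        grad = X.T @ (X @ beta - y) / n
        z = beta.copy()
        for _ in range(cd_passes):            # approximately solve the quadratic model
            for j in range(len(beta)):
                r_j = grad[j] + H[j] @ (z - beta)          # partial derivative at z
                u = z[j] - r_j / H[j, j]
                z[j] = np.sign(u) * max(abs(u) - w[j] / H[j, j], 0.0)  # soft-threshold
        beta = z
    return beta

def dc_proximal_newton(X, y, lam, stages=5):
    """Multi-stage convex relaxation: each stage is a weighted-lasso problem."""
    beta = np.zeros(X.shape[1])
    for _ in range(stages):
        w = mcp_weight(beta, lam)     # DC step: linearize the concave part of the penalty
        beta = prox_newton_weighted_lasso(X, y, w, beta)
    return beta
```

Since the weights at the first stage all equal lam, the first relaxation reduces to a standard lasso fit; later stages relax the penalty on coordinates that already look large.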


Online estimation of the inverse of the Hessian for stochastic optimization with application to universal stochastic Newton algorithms

Godichon-Baggioni, Antoine, Lu, Wei, Portier, Bruno

arXiv.org Machine Learning

This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is introduced. This approach drastically reduces the computational complexity. Above all, it allows the development of universal stochastic Newton methods and an investigation of the asymptotic efficiency of the proposed approach. This work thus expands the application scope of second-order algorithms in stochastic optimization.
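
For concreteness, here is a minimal sketch of one way a recursively updated inverse-Hessian estimate can drive an online Newton step, using the Sherman-Morrison identity on rank-one per-sample curvature terms for logistic regression. This illustrates the general idea of avoiding explicit inversion; it is not the paper's Robbins-Monro recursion, and all names and constants are assumptions.

```python
# Hedged sketch: online stochastic Newton for logistic regression where the
# inverse of (reg*I + sum of per-sample Hessian terms) is tracked directly,
# so no matrix is ever inverted explicitly.
import numpy as np

def online_newton_logistic(stream, d, reg=1.0):
    """stream yields (x, y) pairs with x in R^d and y in {0, 1}."""
    theta = np.zeros(d)
    A_inv = np.eye(d) / reg
    for x, y in stream:
        p = 1.0 / (1.0 + np.exp(-x @ theta))
        grad = (p - y) * x                            # per-sample gradient
        u = np.sqrt(max(p * (1.0 - p), 1e-6)) * x     # per-sample Hessian term ~ u u^T
        Au = A_inv @ u
        A_inv -= np.outer(Au, Au) / (1.0 + u @ Au)    # Sherman-Morrison rank-one update
        theta -= A_inv @ grad                         # Newton-type step
    return theta
```

Because A_inv absorbs the accumulated curvature, the effective step decays roughly like 1/n in each direction without an explicit step-size sequence.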


Online stochastic Newton methods for estimating the geometric median and applications

Godichon-Baggioni, Antoine, Lu, Wei

arXiv.org Machine Learning

In the context of large samples, a small number of individuals might spoil basic statistical indicators like the mean. It is difficult to detect these atypical individuals automatically, and an alternative strategy is to use robust approaches. This paper focuses on estimating the geometric median of a random variable, which is a robust indicator of central tendency. In order to deal with large samples of data arriving sequentially, online stochastic Newton algorithms for estimating the geometric median are introduced, and we give their rates of convergence. Since estimates of the median and those of the Hessian matrix can be recursively updated, we also determine confidence intervals of the median in any designated direction and perform online statistical tests.
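
A minimal sketch of an online stochastic Newton update for the geometric median is given below. The per-sample gradient and Hessian of m -> ||x - m|| follow directly from differentiating the norm; the ridge term, step size, and initialization are illustrative assumptions rather than the paper's exact recursion.

```python
# Hedged sketch: online stochastic Newton for the geometric median of a
# d-dimensional data stream, with a recursively averaged Hessian estimate.
import numpy as np

def online_newton_geometric_median(stream, d, ridge=1e-3):
    m = np.zeros(d)                     # current median estimate
    H = np.eye(d)                       # running average of per-sample Hessian estimates
    n = 0
    for x in stream:
        n += 1
        diff = x - m
        r = max(np.linalg.norm(diff), 1e-8)
        u = diff / r
        grad = -u                                     # gradient of ||x - m|| in m
        H_sample = (np.eye(d) - np.outer(u, u)) / r   # Hessian of ||x - m|| in m
        H += (H_sample - H) / n                       # recursive Hessian average
        m -= np.linalg.solve(H + ridge * np.eye(d), grad) / n   # Newton-type step
    return m
```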


Non asymptotic analysis of Adaptive stochastic gradient algorithms and applications

Godichon-Baggioni, Antoine, Tarrago, Pierre

arXiv.org Machine Learning

In stochastic optimization, a common tool for dealing sequentially with large samples is the well-known stochastic gradient algorithm. Nevertheless, since the step sequence is the same for each direction, this can lead to poor results in practice for ill-conditioned problems. To overcome this, adaptive gradient algorithms such as Adagrad or stochastic Newton algorithms should be preferred. This paper is devoted to the non-asymptotic analysis of these adaptive gradient algorithms for strongly convex objectives. All the theoretical results are adapted to linear regression and regularized generalized linear models for both Adagrad and stochastic Newton algorithms.
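
To make the contrast with plain stochastic gradient concrete, here is a minimal Adagrad sketch for streaming linear regression: the accumulated squared gradients give each coordinate its own effective step size, which is what helps on ill-conditioned problems. The step size and epsilon below are illustrative choices, not values from the paper.

```python
# Hedged sketch: per-coordinate Adagrad applied to streaming least squares.
import numpy as np

def adagrad_linear_regression(stream, d, eta=1.0, eps=1e-8):
    theta = np.zeros(d)
    G = np.zeros(d)                    # running sum of squared gradients, per coordinate
    for x, y in stream:                # (x, y) with x in R^d, y a scalar response
        grad = (x @ theta - y) * x     # per-sample least-squares gradient
        G += grad ** 2
        theta -= eta * grad / (np.sqrt(G) + eps)   # direction-wise adaptive step
    return theta
```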


Differentially private inference via noisy optimization

Avella-Medina, Marco, Bradshaw, Casey, Loh, Po-Ling

arXiv.org Machine Learning

Over the last decade, differential privacy has evolved from a rigorous paradigm derived by theoretical computer scientists for releasing sensitive data to a technology deployed at scale in numerous applications [Ding et al., 2017, Erlingsson et al., 2014, Garfinkel et al., 2019, Tang et al., 2017]. The setting assumes the existence of a trusted curator who holds the data of individuals in a database, and the goal of privacy is to simultaneously protect individual data while allowing statistical analysis of the aggregate database. Such protection is guaranteed by differential privacy in the context of a remote access query system, where a statistician can only indirectly access the data, e.g., by obtaining noisy summary statistics or outputs of a model. Injecting noise before releasing information to the statistician is essential for preserving privacy, and the noise should be as small as possible in order to optimize statistical performance of the released statistics. In this paper, we consider the problem of estimation and inference for M-estimators. Inspired by the work of Bassily et al. [2014], Lee and Kifer [2018], Song et al. [2013], and Feldman et al. [2020], among others, we propose noisy optimization procedures that output differentially private counterparts of standard M-estimators. The central idea of these methods is to add noise to every iterate of a gradient-based optimization routine in a way that causes each iterate to satisfy a targeted differential privacy guarantee.
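
The sketch below illustrates the general recipe the abstract refers to: Gaussian noise is added to every iterate of a clipped gradient descent. The clipping bound and noise scale are placeholders, since calibrating them to a target (epsilon, delta) guarantee is precisely the technical content of such procedures.

```python
# Hedged sketch: noisy gradient descent of the kind described above. The
# clipping bound and noise scale are illustrative placeholders, not a
# calibrated differential-privacy mechanism.
import numpy as np

def noisy_gradient_descent(grad_fn, theta0, n_iters=100, step=0.1,
                           clip=1.0, noise_scale=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    theta = theta0.copy()
    for _ in range(n_iters):
        g = grad_fn(theta)
        g = g / max(1.0, np.linalg.norm(g) / clip)                # bound the gradient norm
        g = g + rng.normal(scale=noise_scale * clip, size=g.shape)  # privacy noise per iterate
        theta -= step * g
    return theta
```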


Generalized Matrix Factorization

Kidziński, Łukasz, Hui, Francis K. C., Warton, David I., Hastie, Trevor

arXiv.org Machine Learning

Unmeasured or latent variables are often the cause of correlations between multivariate measurements and are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVM) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to such high-volume, high-dimensional datasets. We approximate the likelihood using penalized quasi-likelihood and use a Newton method and Fisher scoring to learn the model parameters. Our method greatly reduces the computation time and can be easily parallelized, enabling factorization at unprecedented scale using commodity hardware. We illustrate the application of our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit, finding that most of the variability can be explained with a handful of factors.
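
As a rough illustration of the per-response updates that such an alternating fit can parallelize, the sketch below performs one penalized Fisher-scoring (IRLS) step for a single Poisson response column given current latent factors. The link, penalty, and the surrounding alternation over factors and loadings are assumptions for illustration, not the authors' estimator.

```python
# Hedged sketch: one penalized Fisher-scoring / IRLS step for a single Poisson
# response column y (length n) given current latent factors U (n x k). An
# alternating GLLVM-style fit could run such updates independently per response.
import numpy as np

def poisson_fisher_scoring_step(U, y, beta, ridge=1e-2):
    """Update the loading vector beta (length k) for one response column."""
    eta = U @ beta                               # linear predictor
    mu = np.exp(eta)                             # Poisson mean under the log link
    W = mu                                       # Fisher weights for the log link
    z = eta + (y - mu) / np.maximum(mu, 1e-8)    # working response
    A = U.T @ (U * W[:, None]) + ridge * np.eye(len(beta))
    b = U.T @ (W * z)
    return np.linalg.solve(A, b)                 # weighted ridge least-squares solve
```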


On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions

Li, Xingguo, Yang, Lin F., Ge, Jason, Haupt, Jarvis, Zhang, Tong, Zhao, Tuo

arXiv.org Machine Learning

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on the difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures/assumptions (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.